End of Semester Presentation

8th May 2025

Laura Furtado Fernandes

Friends

Friends is a famous sitcom that aired on NBC between 1994 and 2004. It followed the lives of six friends living in Manhattan and became a global pop culture sensation! Over its 10 seasons, the show's scripts built up a lot of character data, which can be fun to analyse!

Project Set Up

For this project I used the following packages to examine the dialogue from Friends episodes.

library(friends) 
library(tidyverse)
library(RColorBrewer)
data(friends)
as_tibble(friends)
# A tibble: 67,373 × 6
   text                                   speaker season episode scene utterance
   <chr>                                  <chr>    <int>   <int> <int>     <int>
 1 There's nothing to tell! He's just so… Monica…      1       1     1         1
 2 C'mon, you're going out with the guy!… Joey T…      1       1     1         2
 3 All right Joey, be nice. So does he h… Chandl…      1       1     1         3
 4 Wait, does he eat chalk?               Phoebe…      1       1     1         4
 5 (They all stare, bemused.)             Scene …      1       1     1         5
 6 Just, 'cause, I don't want her to go … Phoebe…      1       1     1         6
 7 Okay, everybody relax. This is not ev… Monica…      1       1     1         7
 8 Sounds like a date to me.              Chandl…      1       1     1         8
 9 [Time Lapse]                           Scene …      1       1     1         9
10 Alright, so I'm back in high school, … Chandl…      1       1     1        10
# ℹ 67,363 more rows

How you doin’?

One of the characters from Friends, Joey Tribbiani, is famous for his catchphrase, “How you doin’?”

Joey Tribbiani

How did I do it?

I wanted to examine how often Joey says his catchphrase and whether that changes across the seasons. To do that, I used the following R code.

howyoudoin <- friends %>% 
  filter(speaker == "Joey Tribbiani") %>% 
  filter(str_detect(text, "(?i)How you doin'\\?")) %>% 
  group_by(season) %>% 
  summarise(howyoudoin_count = n()) %>% 
  mutate(season = as.character(season)) 

howyoudoinmerge <- data.frame(
  season = c("1", "2", "3", "10"), 
  howyoudoin_count = c(0, 0, 0, 0)
)

howyoudoin <- bind_rows(howyoudoin, howyoudoinmerge) %>% 
  mutate(season = factor(season, levels = sort(unique(as.numeric(season)))))
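
The zero rows above are typed in by hand for the seasons that had no matches. As a hedged alternative, tidyr's complete() can fill in any missing season automatically. This is only a sketch: toy_counts and its numbers are made up for illustration and are not results from the dataset.

```r
library(dplyr)
library(tidyr)

# Toy summary standing in for the real per-season counts
# (hypothetical numbers, for illustration only)
toy_counts <- tibble(
  season = c(4L, 5L, 7L),
  howyoudoin_count = c(2L, 3L, 1L)
)

# Keep season numeric until the end so that 1:10 sorts correctly,
# add a zero row for every season missing from the summary,
# then convert to a factor for plotting
filled <- toy_counts %>%
  complete(season = 1:10, fill = list(howyoudoin_count = 0L)) %>%
  mutate(season = factor(season, levels = 1:10))

filled
```

With this approach the list of empty seasons does not need to be known in advance, so the same code keeps working if the regex is changed.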

How each season is doin?

ggplot(howyoudoin, 
       aes(x = season, y = howyoudoin_count)) + 
  geom_col(aes(fill = season, colour = season)) + 
  labs(x = "Season", 
       y = "Number of times Joey says 'how you doin'", 
       title = "How each season of Friends is doin'") + 
  theme_minimal() + 
  theme(legend.position="none")

Smelly Friends…

Another recurring joke in the show is Phoebe’s singing. She is a notoriously bad singer, with weird song subjects and lyrics on top of that. One of her most famous songs is “Smelly Cat”.

Smelly Cat

Phoebe is also one of the most overlooked characters in the show. I wanted to see whether Phoebe has more lines in episodes with more mentions of “Smelly Cat”.

How did I do it?

I used the following code to count the utterances per episode for each of the six main friends, and compared that to the number of “Smelly Cat” mentions per episode.

smelly_cat <- friends %>% 
  mutate(smelly_cats = str_count(text, "(?i)\\bsmelly cat\\b")) %>% 
  group_by(episode, season) %>%
  summarise(total_smelly_cats = sum(smelly_cats)) %>% 
  ungroup()

utterances_by_char <- friends %>% 
  group_by(season, episode, speaker) %>%
  summarise(total_utterances = n()) %>% 
  ungroup() %>% 
  filter(speaker %in% c("Chandler Bing", "Monica Geller", "Joey Tribbiani", "Phoebe Buffay", "Ross Geller", "Rachel Green"))
  
smelly_cats_char <- left_join(utterances_by_char, smelly_cat, by = c("season", "episode"))
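
To check what the “Smelly Cat” pattern actually counts, here is a small sketch on made-up lines (toy_lines is hypothetical and not taken from the scripts). It assumes the same word-boundary regex as above, written with stringr's regex(ignore_case = TRUE) instead of the inline (?i) flag, and joins on explicit keys.

```r
library(dplyr)
library(stringr)

# Made-up lines for illustration only
toy_lines <- tibble(
  season  = c(1L, 1L, 1L),
  episode = c(1L, 1L, 2L),
  speaker = c("Phoebe Buffay", "Phoebe Buffay", "Ross Geller"),
  text    = c("Smelly Cat, smelly cat, what are they feeding you?",
              "I love smelly cats.",
              "Hi.")
)

# Count mentions per line; \\b stops "smelly cats" from matching
toy_mentions <- toy_lines %>%
  mutate(smelly_cats = str_count(text, regex("\\bsmelly cat\\b", ignore_case = TRUE))) %>%
  group_by(season, episode) %>%
  summarise(total_smelly_cats = sum(smelly_cats), .groups = "drop")

# Utterances per speaker per episode
toy_utterances <- toy_lines %>%
  count(season, episode, speaker, name = "total_utterances")

# Join on explicit keys so the match columns are visible in the code
toy_joined <- left_join(toy_utterances, toy_mentions, by = c("season", "episode"))

toy_joined
```

Here the first line contributes two mentions (both capitalisations match) and the plural "smelly cats" contributes none, because the trailing word boundary requires the match to end at "cat".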

What I found:

Speaking pattern changes over the average season

Friends was on for so long that the way characters spoke may have fluctuated from season to season, or even over the course of a season. I examined the average utterance length (in characters) for each of the six main friends, both by episode number averaged across all seasons (the “average season”) and by season.

speaking_len <- friends %>% 
  mutate(utterance_length = str_length(text)) %>% 
  mutate(season = as.character(season)) %>%
  mutate(speaker = ifelse(speaker == "Chandler Bing" | speaker == "Monica Geller" | speaker == "Joey Tribbiani" | speaker == "Phoebe Buffay" | speaker == "Ross Geller"| speaker == "Rachel Green", speaker, "Other")) %>%
  group_by(episode, speaker) %>% 
  summarise(avg_utt_length = mean(utterance_length, na.rm = TRUE)) %>% 
  ungroup() %>% 
  filter(!is.na(speaker), speaker != "Other")

What I found:

Speaking pattern changes over the series

speaking_len <- friends %>% 
  mutate(utterance_length = str_length(text)) %>% 
  mutate(speaker = ifelse(speaker == "Chandler Bing" | speaker == "Monica Geller" | speaker == "Joey Tribbiani" | speaker == "Phoebe Buffay" | speaker == "Ross Geller"| speaker == "Rachel Green", speaker, "Other")) %>%
  group_by(season, speaker) %>% 
  summarise(avg_utt_length = mean(utterance_length, na.rm = TRUE)) %>% 
  ungroup() %>% 
  filter(!is.na(speaker), speaker != "Other")
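
The list of six names is repeated in several places, so one option is to store it once and filter with %in%. This is a sketch on made-up lines, not the code used for the plots: main_friends and toy_lines are names introduced here for illustration.

```r
library(dplyr)
library(stringr)

# The six main characters, stored once for reuse (hypothetical helper name)
main_friends <- c("Chandler Bing", "Monica Geller", "Joey Tribbiani",
                  "Phoebe Buffay", "Ross Geller", "Rachel Green")

# Made-up lines for illustration only
toy_lines <- tibble(
  season  = c(1L, 1L, 1L, 2L),
  speaker = c("Ross Geller", "Ross Geller", "Gunther", "Ross Geller"),
  text    = c("Hi", "Pivot!", "Hello", "Okay")
)

# Keep only the main friends, then average utterance length in characters.
# %in% returns FALSE for a missing speaker, so those rows are dropped too.
toy_len <- toy_lines %>%
  filter(speaker %in% main_friends) %>%
  mutate(utterance_length = str_length(text)) %>%
  group_by(season, speaker) %>%
  summarise(avg_utt_length = mean(utterance_length, na.rm = TRUE),
            .groups = "drop")

toy_len
```

Filtering before the summary means no "Other" group has to be created and then removed, and a change to the cast list only needs to be made in one place.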

What I found:

Oh Janice…

While not one of the main friends, Janice is a memorable character. She is Chandler’s ex-girlfriend and is famous for her horrible-sounding laugh.

Janice

I wanted to see which of the six friends talks about Janice most often, with Janice’s own lines included for comparison.

janice <- friends %>%
  mutate(Janices = str_count(text, "(?<=Janice)\\b")) %>%
  filter(speaker %in% c("Chandler Bing", "Monica Geller", "Joey Tribbiani", "Phoebe Buffay", "Ross Geller", "Rachel Green", "Janice Litman Goralnik")) %>% 
  group_by(speaker) %>% 
  summarise(total_janices = sum(Janices))
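
The lookbehind pattern above counts the word boundary that follows each “Janice”. A possibly simpler way to express a similar idea is to match the name itself between two word boundaries. The sketch below only shows how that alternative pattern behaves on made-up strings; it is not a claim that the two patterns agree on every line of the scripts.

```r
library(stringr)

# Made-up strings for illustration only
toy_text <- c("Oh. My. God. Janice!",
              "Janice, Janice, Janice",
              "janice")

# \\bJanice\\b matches the whole word; it is case-sensitive by default,
# so the lower-case "janice" is not counted
janice_counts <- str_count(toy_text, "\\bJanice\\b")

janice_counts
# → 1 3 0
```

If lower-case spellings should count as well, the pattern could be wrapped in regex(..., ignore_case = TRUE).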

What I found:

Whose voice is most welcoming?

Finally, I was curious to see who is most often the first friend to speak in an episode.

first <- friends %>% 
  filter(speaker != "Scene Directions") %>% 
  group_by(season, episode) %>% 
  slice_min(order_by = utterance, n = 1) %>% 
  ungroup() %>%
  filter(scene == 1) %>%
  mutate(speaker = ifelse(speaker == "Chandler Bing" | speaker == "Monica Geller" | speaker == "Joey Tribbiani" | speaker == "Phoebe Buffay" | speaker == "Ross Geller"| speaker == "Rachel Green", speaker, "Other")) %>%
  group_by(speaker) %>% 
  summarise(firstspeaker_count = n())
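
As a quick check of the first-speaker logic, here is a sketch on a made-up two-episode table (toy_scenes is hypothetical). It assumes the same approach as above: drop scene directions, take the lowest utterance number in each episode, then count speakers. It uses with_ties = FALSE so that each episode contributes exactly one row.

```r
library(dplyr)

# Made-up rows for illustration only: episode 1 opens with a scene direction
toy_scenes <- tibble(
  season    = c(1L, 1L, 1L, 1L, 1L, 1L),
  episode   = c(1L, 1L, 1L, 2L, 2L, 2L),
  scene     = c(1L, 1L, 1L, 1L, 1L, 1L),
  utterance = c(1L, 2L, 3L, 1L, 2L, 3L),
  speaker   = c("Scene Directions", "Monica Geller", "Joey Tribbiani",
                "Monica Geller", "Scene Directions", "Ross Geller")
)

first_toy <- toy_scenes %>%
  filter(speaker != "Scene Directions") %>%        # ignore stage directions
  group_by(season, episode) %>%
  slice_min(order_by = utterance, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  count(speaker, name = "firstspeaker_count", sort = TRUE)

first_toy
```

In this toy table Monica speaks first in both episodes, even though episode 1 technically starts with a scene direction, so she is the only speaker in the result.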

What I found:

The Friends we made along the way…

I had a lot of fun with this project and enjoyed learning how to use regular expressions (regex)!